Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot Interaction
Perspective-taking is the ability to perceive or understand a situation or
concept from another individual's point of view, and is crucial in daily human
interactions. Enabling robots to perform perspective-taking remains an unsolved
problem; existing approaches that use deterministic or handcrafted methods are
unable to accurately account for uncertainty in partially-observable settings.
This work proposes to address this limitation via a deep world model that
enables a robot to perform both perceptual and conceptual perspective-taking,
i.e., the robot is able to infer what a human sees and believes. The key
innovation is a decomposed multi-modal latent state space model able to
generate and augment fictitious observations/emissions. Optimizing the ELBO
that arises from this probabilistic graphical model enables the learning of
uncertainty in latent space, which facilitates uncertainty estimation from
high-dimensional observations. We task our model with predicting human
observations and beliefs on three partially-observable HRI tasks. Experiments
show that our method significantly outperforms existing baselines and is able
to infer the visual observations available to other agents as well as their
internal beliefs.
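
For orientation, the ELBO mentioned above can be illustrated with the generic bound for a single-modality sequential latent state space model. This is only a sketch under assumed notation, not the decomposed multi-modal objective of LEAPT itself: the symbols x_t (observation), z_t (latent state), p_theta (generative model), and q_phi (inference model) are assumptions introduced here and do not appear in the abstract.

```latex
% Generic ELBO for a sequential latent state space model (assumed notation):
%   x_t      observation at time t
%   z_t      latent state at time t
%   p_\theta generative (emission and transition) model
%   q_\phi   approximate filtering posterior
\log p_\theta(x_{1:T}) \;\ge\;
\sum_{t=1}^{T} \mathbb{E}_{q_\phi}\!\Big[
    \log p_\theta(x_t \mid z_t)
    \;-\; \mathrm{KL}\big(
        q_\phi(z_t \mid z_{t-1}, x_t)
        \,\big\|\,
        p_\theta(z_t \mid z_{t-1})
    \big)
\Big]
```

The first term rewards latent states that reconstruct the emissions, and the KL term keeps the inferred posterior close to the learned transition prior. A multi-modal, decomposed model of the kind described would presumably carry one emission term per modality, but the abstract does not give its exact form.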